
Search in the Catalogues and Directories

Hits 1 – 2 of 2

1
MassiveSumm: a very large-scale, very multilingual, news summarisation dataset ...
Anthology paper link: https://aclanthology.org/2021.emnlp-main.797/
Abstract: Current research in automatic summarisation is unapologetically anglo-centred - a persistent state-of-affairs, which also predates neural net approaches. High-quality automatic summarisation datasets are notoriously expensive to create, posing a challenge for any language. However, with digitalisation, archiving, and social media advertising of newswire articles, recent work has shown how, with careful methodology application, large-scale datasets can now be simply gathered instead of written. In this paper, we present a large-scale multi-lingual summarisation dataset containing articles in 92 languages, spread across 28.8 million articles, in more than 35 writing scripts. This is both the largest, most inclusive, existing automatic summarisation dataset, as well as one of the largest, most inclusive, ever published datasets for any NLP task. We present the first investigation on the efficacy of resource building from news ...
URL: https://dx.doi.org/10.48448/8thm-zg55
https://underline.io/lecture/37700-massivesumm-a-very-large-scale,-very-multilingual,-news-summarisation-dataset
BASE
2
The Danish Gigaword Project ...
BASE

© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Change privacy settings